{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Why need feature importance?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Defined objective ->> Levers *(what inputs can we control, intersection of levers and random forest important features)* ->> Data ->> Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* What is causing something to happen? Why is that happening? For all these questions, feature interpretetion is importance.  \n",
    "* We use **simulation** to predict the outcome of things that can happen from our predictions.    \n",
    "    * E.g. of simulation -> What would be change in Jeremy's behaviour as customer if we send a this customer service guy to him?  \n",
    "    \n",
    "* Some industries measure the success on AUC-ROC, which is different than finding what would be the outcome of it  \n",
    "    * E.g. about tree interpretor. In a hospital, we predicted that this patient is likely to be readmitted in 2 weeks. But main aim of model is to tell what can we do about it? for e.g. this patient is highly likely to come back to hospital, but what we are interested in is what can we do about this patient now? Maybe he is highly likely to come back because of his high BP, so fix that now.  \n",
    "\n",
    "*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*** -- thought cloud -- ***\n",
    "\n",
    "* If you building "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Feature importance\n",
    "\n",
    "How is it calculated? \n",
    "\n",
    "* Let our original more score is s1 -> Randomly shuffle a feature f1 -> make prediction on same model with only f1 shuffled -> new score s2 -> importance is s2 - s1\n",
    "\n",
    "For e.g. if some feature is 1000 times more important than any other feature, then focus on using that one variable. \n",
    "\n",
    "* Look at feature importance line plot. See where it flatten out. \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tree interpretor\n",
    "\n",
    "e.g. \n",
    "\n",
    "10(mean/bias) -**route1**-> 9.4(contribution = 10-9.4 = **-0.6**) -**route2**-> 9.7(contribution = 9.7-9.4 = **0.3**) -**route3**-> 9.1(contribution = 9.7-9.1 = **0.6**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Interaction term contributions\n",
    "\n",
    "e.g. in above tree interpretor case, **contribution of rout1(*)route2 = 9.7-10 = -0.3**.\n",
    "\n",
    "Intercation here means splitting on some branch (don't think about multiplication, addition or ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Coding for feature importance'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.ensemble import RandomForestRegressor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "x = np.linspace(0,1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "y = x + np.random.uniform(-.2, .2, x.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "plt.scatter(x,y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(50, 1)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[:, None].shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(50, 1)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[...,None].shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1, 50)"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[None,:].shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "x_trn, x_tst = "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# creating random forest\n",
    "m = RandomForestRegressor().fit(x_trn, y_trn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*** -- thought cloud -- ***\n",
    "\n",
    "* "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
